The Sensitivity of Topic Coherence Evaluation to Topic Cardinality
نویسندگان
چکیده
When evaluating the quality of topics generated by a topic model, the convention is to score topic coherence — either manually or automatically — using the top-N topic words. This hyper-parameter N , or the cardinality of the topic, is often overlooked and selected arbitrarily. In this paper, we investigate the impact of this cardinality hyper-parameter on topic coherence evaluation. For two automatic topic coherence methodologies, we observe that the correlation with human ratings decreases systematically as the cardinality increases. More interestingly, we find that performance can be improved if the system scores and human ratings are aggregated over several topic cardinalities before computing the correlation. In contrast to the standard practice of using a fixed value of N (e.g. N = 5 or N = 10), our results suggest that calculating topic coherence over several different cardinalities and averaging results in a substantially more stable and robust evaluation. We release the code and the datasets used in this research, for reproducibility.1
منابع مشابه
Automatic keyword extraction using Latent Dirichlet Allocation topic modeling: Similarity with golden standard and users' evaluation
Purpose: This study investigates the automatic keyword extraction from the table of contents of Persian e-books in the field of science using LDA topic modeling, evaluating their similarity with golden standard, and users' viewpoints of the model keywords. Methodology: This is a mixed text-mining research in which LDA topic modeling is used to extract keywords from the table of contents of sci...
متن کاملConnection between Emotional Intelligence and Coherence in EFL Writing
This research is an investigation of the relationship between the emotional intelligence of a group of Iranian English language learners and the coherence of the essays they produced. A 40-item TOEFL-type test of English proficiency including items on grammar, vocabulary, reading comprehension with the Cronbach’s alpha reliability of .742, was planned to identify the proficient candidates who c...
متن کاملA Novel Image Denoising Method Based on Incoherent Dictionary Learning and Domain Adaptation Technique
In this paper, a new method for image denoising based on incoherent dictionary learning and domain transfer technique is proposed. The idea of using sparse representation concept is one of the most interesting areas for researchers. The goal of sparse coding is to approximately model the input data as a weighted linear combination of a small number of basis vectors. Two characteristics should b...
متن کاملTopic Modeling and Classification of Cyberspace Papers Using Text Mining
The global cyberspace networks provide individuals with platforms to can interact, exchange ideas, share information, provide social support, conduct business, create artistic media, play games, engage in political discussions, and many more. The term cyberspace has become a conventional means to describe anything associated with the Internet and the diverse Internet culture. In fact, cyberspac...
متن کاملA review of text mining approaches and their function in discovering and extracting a topic
Background and aim: Four text mining methods are examined and focused on understanding and identifying their properties and limitations in subject discovery. Methodology: The study is an analytical review of the literature of text mining and topic modeling. Findings: LSA could be used to classify specific and unique topics in documents that address only a single topic. The other three text min...
متن کامل